An Initial Seed Selection Algorithm for K-means Clustering of Georeferenced Data to Improve Replicability of Cluster Assignments for Mapping Application

Abstract

K-means is one of the most widely used clustering algorithms in various disciplines, especially for large datasets. However, the method is known to be highly sensitive to initial seed selection of cluster centers. K-means++ has been proposed to overcome this problem and has been shown to have better accuracy and computational efficiency than k-means. In many clustering problems, though (such as when classifying georeferenced data for mapping applications), standardization of clustering methodology, specifically the ability to arrive at the same cluster assignment for every run of the method, i.e. replicability of the methodology, may be of greater significance than any perceived measure of accuracy, especially when the solution is known to be non-unique, as in the case of k-means clustering. Here we propose a simple initial seed selection algorithm for k-means clustering along one attribute that draws initial cluster boundaries along the 'deepest valleys' or greatest gaps in the dataset. Thus, it incorporates a measure to maximize distance between consecutive cluster centers, which augments the conventional k-means optimization for minimum distance between cluster center and cluster members. Unlike existing initialization methods, no additional parameters or degrees of freedom are introduced to the clustering algorithm. This improves the replicability of cluster assignments by as much as 100% over k-means and k-means++, virtually reducing the variance over different runs to zero, without introducing any additional parameters to the clustering process. Further, the proposed method is more computationally efficient than k-means++ and, in some cases, more accurate.
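
The abstract gives only a high-level description of the seeding idea, so the following is a minimal sketch of one possible reading of it, not the author's implementation. It assumes the data is a single numeric attribute, sorts it, places cluster boundaries at the k-1 largest gaps between consecutive values, and uses the mean of each resulting group as an initial center. The function name `gap_based_seeds` and the choice of group means as seeds are assumptions made for illustration.

```python
import numpy as np

def gap_based_seeds(values, k):
    """Return k initial centers for 1-D data by cutting at the k-1 largest gaps.

    Hypothetical helper: an illustrative reading of the 'greatest gaps' idea,
    not the algorithm as specified in the paper.
    """
    x = np.sort(np.asarray(values, dtype=float))
    if not 1 <= k <= x.size:
        raise ValueError("k must be between 1 and the number of points")
    # Distance between each pair of consecutive sorted points.
    gaps = np.diff(x)
    # Indices of the k-1 largest gaps; a stable sort keeps ties deterministic.
    largest = np.argsort(-gaps, kind="stable")[: k - 1]
    # A gap at index i separates x[i] from x[i + 1], so split before i + 1.
    cuts = np.sort(largest) + 1
    groups = np.split(x, cuts)
    # Assumption: each group's mean serves as the initial cluster center.
    return np.array([group.mean() for group in groups])
```

Because the procedure involves no random draws, the same input always yields the same seeds regardless of input order, which is the replicability property the abstract emphasizes. The seeds could then be passed to a standard k-means routine as fixed initial centers, for example as the `init` array of scikit-learn's `KMeans` after reshaping to one column.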

Bibliographic details

  • Author: Khan, Fouad
  • Year: 2016
  • Original format: PDF
